144 research outputs found

Robust Learning from Bites

Author: Christmann Andreas

Many robust statistical procedures have two drawbacks. Firstly, they are computer-intensive such that they can hardly be used for massive data sets. Secondly, robust confidence intervals for the estimated parameters or robust predictions according to the fitted models are often unknown. Here, we propose a general method to overcome these problems of robust estimation in the context of huge data sets. The method is scalable to the memory of the computer, can be distributed on several processors if available, and can help to reduce the computation time substantially. The method additionally offers distribution-free confidence intervals for the median of the predictions. The method is illustrated for two situations: robust estimation in linear regression and kernel logistic regression from statistical machine learning.
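The abstract above does not spell out the algorithm, so the following is only a rough sketch of one way such a scheme could look. It assumes the data are split into disjoint chunks ("bites"), a robust fit is computed on each bite, and the per-bite predictions are combined by their median. The Theil-Sen estimator is a stand-in chosen here for illustration and is not taken from the paper. The function names and the defaults of 20 bites and `alpha=0.05` are likewise hypothetical. The confidence interval uses the standard order-statistic construction for a median, which is distribution-free for continuous data.

```python
import math
import random
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Robust simple linear fit (illustrative stand-in, not the paper's estimator):
    slope = median of pairwise slopes, intercept = median of y - slope * x."""
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i, j in combinations(range(len(xs)), 2)
              if xs[j] != xs[i]]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return intercept, slope


def median_ci_ranks(b, alpha=0.05):
    """1-based ranks (lo, hi) such that [v_(lo), v_(hi)] covers the median of
    b i.i.d. continuous values with probability at least 1 - alpha."""
    cdf, k = 0.0, 0
    while k < b // 2:
        nxt = cdf + math.comb(b, k) / 2 ** b  # P(Binomial(b, 1/2) <= k)
        if 2 * nxt > alpha:
            break
        cdf = nxt
        k += 1
    if k == 0:
        raise ValueError("too few bites for the requested confidence level")
    return k, b - k + 1


def predict_from_bites(xs, ys, x0, n_bites=20, alpha=0.05, seed=0):
    """Fit each disjoint bite separately, then return the median prediction
    at x0 together with a distribution-free interval for that median."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    preds = []
    for b in range(n_bites):
        bite = idx[b::n_bites]  # every n_bites-th shuffled index: disjoint bites
        a, s = theil_sen([xs[i] for i in bite], [ys[i] for i in bite])
        preds.append(a + s * x0)
    preds.sort()
    lo, hi = median_ci_ranks(n_bites, alpha)
    return median(preds), (preds[lo - 1], preds[hi - 1])
```

Because each bite is fitted independently, the loop body could be handed to separate processes, and only one bite needs to be in memory at a time. With 20 bites and `alpha=0.05` the interval runs from the 6th to the 15th ordered prediction.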

Research Papers in Economics

On a strategy to develop robust and simple tariffs from motor vehicle insurance data

Author: Christmann Andreas

The goals of this paper are twofold: we describe common features in data sets from motor vehicle insurance companies and we investigate a general strategy which exploits the knowledge of such features. The results of the strategy are a basis to develop insurance tariffs. The strategy is applied to a data set from motor vehicle insurance companies. We use a nonparametric approach based on a combination of kernel logistic regression and ε-support vector regression.

Keywords: Classification, Data Mining, Insurance tariffs, Kernel logistic regression, Machine learning, Regression, Robustness, Simplicity, Support Vector Machine, Support Vector Regression

Research Papers in Economics

Regression depth and support vector machine

Author: Christmann Andreas

The regression depth method (RDM) proposed by Rousseeuw and Hubert [RH99] plays an important role in the area of robust regression for a continuous response variable. Christmann and Rousseeuw [CR01] showed that RDM is also useful for the case of binary regression. Vapnik's convex risk minimization principle [Vap98] has a dominating role in statistical machine learning theory. Important special cases are the support vector machine (SVM), ε-support vector regression and kernel logistic regression. In this paper connections between these methods from different disciplines are investigated for the case of pattern recognition. Some results concerning the robustness of the SVM and other kernel based methods are given.

Research Papers in Economics

Qualitative Robustness of Support Vector Machines

Authors: Christmann Andreas; Hable Robert
Publication date: 31/07/2011

Support vector machines have attracted much attention in theoretical and in applied statistics. Main topics of recent interest are consistency, learning rates and robustness. In this article, it is shown that support vector machines are qualitatively robust. Since support vector machines can be represented by a functional on the set of all probability measures, qualitative robustness is proven by showing that this functional is continuous with respect to the topology generated by weak convergence of probability measures. Combined with the existence and uniqueness of support vector machines, our results show that support vector machines are the solutions of a well-posed mathematical problem in Hadamard's sense.

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Estimating conditional quantiles with the help of the pinball loss

Authors: Christmann Andreas; Steinwart Ingo
Publication venue: 'Bernoulli Society for Mathematical Statistics and Probability'
Publication date: 10/02/2011

The so-called pinball loss for estimating conditional quantiles is a well-known tool in both statistics and machine learning. So far, however, little work has been done to quantify the efficiency of this tool for nonparametric approaches. We fill this gap by establishing inequalities that describe how close approximate pinball risk minimizers are to the corresponding conditional quantile. These inequalities, which hold under mild assumptions on the data-generating distribution, are then used to establish so-called variance bounds, which recently turned out to play an important role in the statistical analysis of (regularized) empirical risk minimization approaches. Finally, we use both types of inequalities to establish an oracle inequality for support vector machines that use the pinball loss. The resulting learning rates are min-max optimal under some standard regularity assumptions on the conditional quantile.

Comment: Published in Bernoulli (http://isi.cbs.nl/bernoulli/) at http://dx.doi.org/10.3150/10-BEJ267 by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
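For readers unfamiliar with the pinball loss named in this abstract, here is a minimal sketch of its usual definition: for quantile level τ, the loss is τ·(y − t) when y ≥ t and (1 − τ)·(t − y) otherwise. The helper names are illustrative and not from the paper. Restricting the search to constant predictions is a simplification made here, intended only to show that minimizing the empirical pinball risk recovers an empirical τ-quantile.

```python
def pinball_loss(y, t, tau):
    """Pinball (quantile) loss of predicting t when the outcome is y."""
    r = y - t
    return tau * r if r >= 0 else (tau - 1) * r


def empirical_pinball_risk(ys, t, tau):
    """Average pinball loss of the constant prediction t over the sample ys."""
    return sum(pinball_loss(y, t, tau) for y in ys) / len(ys)


def best_constant(ys, tau):
    """Sample point minimizing the empirical pinball risk.
    The risk is piecewise linear in t, so a minimizer exists among the sample."""
    return min(sorted(ys), key=lambda t: empirical_pinball_risk(ys, t, tau))
```

For example, `best_constant([1, 2, 3, 4, 100], 0.5)` picks the sample median 3, unaffected by the outlier at 100.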

arXiv.org e-Print Archive

Crossref

On robustness properties of convex risk minimization methods for pattern recognition

Authors: Christmann Andreas; Steinwart Ingo

The paper brings together methods from two disciplines: machine learning theory and robust statistics. Robustness properties of machine learning methods based on convex risk minimization are investigated for the problem of pattern recognition. Assumptions are given for the existence of the influence function of the classifiers and for bounds of the influence function. Kernel logistic regression, support vector machines, least squares and the AdaBoost loss function are treated as special cases. A sensitivity analysis of the support vector machine is given.

Keywords: AdaBoost loss function, influence function, kernel logistic regression, robustness, sensitivity curve, statistical learning, support vector machine, total variation

Research Papers in Economics

Consistency and robustness of kernel based regression

Authors: Christmann Andreas; Steinwart Ingo

We investigate properties of kernel based regression (KBR) methods which are inspired by the convex risk minimization method of support vector machines. We first describe the relation between the loss function used by the KBR method and the tail of the response variable Y. We then establish a consistency result for KBR and give assumptions for the existence of the influence function. In particular, our results allow one to choose the loss function and the kernel to obtain computationally tractable and consistent KBR methods having bounded influence functions. Furthermore, bounds for the sensitivity curve, which is a finite-sample version of the influence function, are developed, and some numerical experiments are discussed.
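The sensitivity curve mentioned in this abstract can be illustrated with a small sketch using its textbook definition: SC_n(z) = n · (T(z_1, …, z_{n-1}, z) − T(z_1, …, z_{n-1})), the rescaled change in an estimator T when one extra point z is added. The example below applies it to the mean and the median of a toy sample rather than to kernel based regression, which is an assumption made here to keep the code short. The function name is hypothetical.

```python
from statistics import mean, median


def sensitivity_curve(estimator, sample, z):
    """Rescaled effect of adding one observation z to the sample:
    n * (T(sample + [z]) - T(sample)), with n = len(sample) + 1."""
    n = len(sample) + 1
    return n * (estimator(sample + [z]) - estimator(sample))


sample = [-2.0, -1.0, 0.0, 1.0, 2.0]

# The mean's sensitivity grows without bound as z moves away from the data,
# while the median's sensitivity stays bounded.
sc_mean = [sensitivity_curve(mean, sample, z) for z in (10.0, 1000.0)]
sc_median = [sensitivity_curve(median, sample, z) for z in (10.0, 1000.0)]
```

A bounded sensitivity curve is the finite-sample counterpart of a bounded influence function, the robustness property the abstract refers to.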

Research Papers in Economics